Data Mining And Data Warehousing
Association Analysis
Association analysis is the process of finding relationships, or associations, between two or more objects or between two or more attributes of objects. It has many useful applications in business. For example, a department store is interested in knowing which products are purchased together by most of its customers; in marketing this is known as Market Basket Analysis. Customers use baskets while shopping and put different products in them, so knowing which items most customers put in the same basket is valuable information for the store. The store can then optimize its layout by placing related products close to each other, so that customers do not have to walk around the store to find the products they need, and it can manage its inventory according to the sales of related items.
Several algorithms have been proposed for association analysis; two of the most important are:
Apriori Algorithm
The Apriori algorithm is particularly suitable for market basket analysis. It takes a transactional database as input and produces interesting associations between products (objects). Two thresholds for the association between objects or patterns have to be specified in this algorithm:
- Minimum Support
- Minimum Confidence
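A rule A --> B is considered interesting when both of the following measures reach the specified minimums:
$$\text{Confidence}(A \rightarrow B) = \frac{\#\,\text{tuples containing } A \text{ and } B}{\#\,\text{tuples containing } A}$$
$$\text{Support}(A \rightarrow B) = \frac{\#\,\text{tuples containing } A \text{ and } B}{\text{total } \#\,\text{tuples}}$$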
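The two formulas translate directly into code. This is a minimal sketch: the function names `count`, `support` and `confidence` are illustrative, and the data are the ten transactions of the worked example in this section, stored as Python sets.

```python
# The ten transactions of the worked example (T1..T10), one set of items each.
TRANSACTIONS = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"}, {"B", "D"},
    {"C", "B", "D"}, {"A", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "C"},
]


def count(itemset, transactions):
    """Number of transactions (tuples) that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(antecedent, consequent, transactions):
    """Support(A --> B) = #tuples containing A and B / total #tuples."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(A --> B) = #tuples containing A and B / #tuples containing A."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"A"}, {"C"}, TRANSACTIONS))     # 0.5
print(confidence({"A"}, {"C"}, TRANSACTIONS))  # 0.8333333333333334
```

Both measures share the same numerator; they differ only in what it is divided by (all tuples versus tuples containing A).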
Apriori Algorithm:
- Consider the dataset, D, of transactions, and specify the required minimum support, which every item or combination of items must reach
- Arrange the items in ascending order, e.g., (A, B, C) or (I1, I2, I3)
- Get the set of single items and name it L0
- Find the frequency (count) of each item in the dataset, D
- Find the support of each item
- Find the items that satisfy the minimum support
- Make a new set, L1, of the items that satisfy the minimum support
- Take the cartesian product of L1 with itself, keeping each pair of distinct items once, to get the set of two-item sets, L2
- Find the frequency and support of each two-item set in L2
- Find the two-item sets that satisfy the minimum support
- Make a set, L3, of the two-item sets that satisfy the minimum support
- Continue in this way with larger itemsets: find their frequency, calculate their support, and check whether they satisfy the minimum support. Itemsets that satisfy the minimum support are interesting patterns and are used for further (larger) pattern generation; those that do not are discarded from further consideration.
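The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `apriori` and its return format are hypothetical, and candidate generation uses the standard Apriori join-and-prune step (two frequent (k-1)-itemsets are combined into a k-itemset, which is kept only if all of its (k-1)-item subsets are frequent). For pairs this gives the same candidates as the cartesian product described above; for larger itemsets it discards candidates that cannot be frequent before counting them.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset whose support
    (count / number of transactions) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate, then keep those meeting the minimum support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k / n >= min_support}

    # Single items (L0), filtered by minimum support (L1).
    items = sorted({item for t in transactions for item in t})
    current = keep_frequent(frozenset([item]) for item in items)
    result = dict(current)
    size = 2
    while current:
        # Join step: unions of two frequent (size-1)-itemsets with `size` items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        size += 1
    return result


# The ten transactions of the worked example (T1..T10).
TRANSACTIONS = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"}, {"B", "D"},
    {"C", "B", "D"}, {"A", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "C"},
]
frequent = apriori(TRANSACTIONS, min_support=0.25)
for itemset, k in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), k)
# ['A'] 6
# ['B'] 6
# ['C'] 7
# ['D'] 6
# ['A', 'C'] 5
# ['A', 'D'] 3
# ['B', 'C'] 4
# ['B', 'D'] 4
# ['C', 'D'] 3
```

The loop stops as soon as a level produces no frequent itemsets, which mirrors the rule that itemsets failing the minimum support are discarded from further consideration.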
Example: Finding Interesting Associations Between Products
Minimum Support = 25%. Total Number of records = 10
| Transaction ID | Items |
| T1 | A, B, C, D |
| T2 | A, C |
| T3 | C, A |
| T4 | A, C, D |
| T5 | B, D |
| T6 | C, B, D |
| T7 | A, D |
| T8 | B, D |
| T9 | B, C |
| T10 | A, B, C |
Single Item Set, L0
Single items with their frequency (count):
| Items | Frequency |
| A | 6 |
| B | 6 |
| C | 7 |
| D | 6 |
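The single-item frequencies in the table can be reproduced by counting every item across the transactions. A minimal sketch using Python's `collections.Counter`, with the example's transactions written as sets:

```python
from collections import Counter

# The ten transactions of the example (T1..T10), one set of items each.
TRANSACTIONS = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"}, {"B", "D"},
    {"C", "B", "D"}, {"A", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "C"},
]

# Frequency (count) of every single item across all transactions.
item_counts = Counter(item for t in TRANSACTIONS for item in t)

for item in sorted(item_counts):
    support = item_counts[item] / len(TRANSACTIONS)
    print(item, item_counts[item], f"{support:.0%}")
# A 6 60%
# B 6 60%
# C 7 70%
# D 6 60%
```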
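All of the items satisfy the minimum support of 25% (each has a support of at least 6/10 = 60%), so all single items are interesting and L1 contains all four items. Now get the set of two-item sets, L2, by taking the cartesian product of L1 with itself and keeping each pair of distinct items once.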
| Items |
| A, B |
| A, C |
| A, D |
| B, C |
| B, D |
| C, D |
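Generating the candidate pairs can be sketched with `itertools.product`. The filter `a < b` keeps each unordered pair of distinct items once, so pairs such as (A, A) and the reversed duplicate (B, A) from the raw cartesian product are dropped; the list name `frequent_items` is illustrative.

```python
from itertools import product

# Single items that met the 25% minimum support (all four in this example).
frequent_items = ["A", "B", "C", "D"]

# Cartesian product of the frequent items with themselves, keeping each
# unordered pair of distinct items once.
candidate_pairs = [(a, b) for a, b in product(frequent_items, repeat=2) if a < b]
print(candidate_pairs)
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```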
Find the frequency and support of each two-item set in L2:
| Items | Frequency | Support |
| A, B | 2 | 2/10=20% |
| A, C | 5 | 5/10=50% |
| A, D | 3 | 3/10=30% |
| B, C | 4 | 4/10=40% |
| B, D | 4 | 4/10=40% |
| C, D | 3 | 3/10=30% |
Here, the interesting two-item sets are (A, C), (A, D), (B, C), (B, D), and (C, D), as all of these have support greater than the minimum support of 25%; (A, B), with only 20%, is discarded. These frequent two-item sets form L3.
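Counting the pairs and applying the 25% threshold can be sketched as follows; the variable names are illustrative and the transactions are those of the example.

```python
from itertools import combinations

# The ten transactions of the example (T1..T10), one set of items each.
TRANSACTIONS = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"}, {"B", "D"},
    {"C", "B", "D"}, {"A", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "C"},
]
MIN_SUPPORT = 0.25

# Candidate two-item sets built from the frequent single items.
candidate_pairs = list(combinations("ABCD", 2))

frequent_pairs = []
for pair in candidate_pairs:
    # A transaction supports the pair if it contains both items.
    frequency = sum(1 for t in TRANSACTIONS if set(pair) <= t)
    support = frequency / len(TRANSACTIONS)
    print(pair, frequency, f"{support:.0%}")
    if support >= MIN_SUPPORT:
        frequent_pairs.append(pair)
print(frequent_pairs)
# ('A', 'B') 2 20%
# ('A', 'C') 5 50%
# ('A', 'D') 3 30%
# ('B', 'C') 4 40%
# ('B', 'D') 4 40%
# ('C', 'D') 3 30%
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

Counting directly from the transactions gives 4 for (B, C), since B and C occur together in T1, T6, T9 and T10.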
To find interesting three-item sets, take the cartesian product of L3 with itself, keep the combinations that form three-item sets, and repeat the above steps.
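The three-item step can be sketched as below. As an assumption beyond the text, the sketch applies the usual Apriori pruning: a three-item candidate is kept only if every two-item subset is one of the frequent pairs, so (A, B, C) and (A, B, D) are never counted because (A, B) was discarded.

```python
from itertools import combinations

# The ten transactions of the example (T1..T10), one set of items each.
TRANSACTIONS = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"}, {"B", "D"},
    {"C", "B", "D"}, {"A", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "C"},
]
MIN_SUPPORT = 0.25

# Frequent two-item sets from the previous step.
frequent_pairs = {frozenset(p) for p in ["AC", "AD", "BC", "BD", "CD"]}

# Three-item candidates whose every two-item subset is a frequent pair.
items = sorted(set().union(*frequent_pairs))
candidates = [
    frozenset(triple)
    for triple in combinations(items, 3)
    if all(frozenset(pair) in frequent_pairs for pair in combinations(triple, 2))
]

frequent_triples = []
for c in candidates:
    frequency = sum(1 for t in TRANSACTIONS if c <= t)
    print(sorted(c), frequency, f"{frequency / len(TRANSACTIONS):.0%}")
    if frequency / len(TRANSACTIONS) >= MIN_SUPPORT:
        frequent_triples.append(c)
print(frequent_triples)
# ['A', 'C', 'D'] 2 20%
# ['B', 'C', 'D'] 2 20%
# []
```

With these ten transactions neither remaining triple reaches 25% support, so no three-item set is interesting and the search stops at pairs.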
Association Analysis
Association analysis is a descriptive data mining technique used to discover interesting relationships (associations) or patterns among items in large datasets. It's most commonly used in market basket analysis to find products that are frequently bought together.
| Term | Meaning |
| Itemset | A collection of one or more items (e.g., {Chips, Coca_cola}) |
| Support | The fraction of transactions in the dataset that contain the itemset |
| Confidence | For a rule X --> Y, the fraction of transactions containing X that also contain Y (i.e., if X is bought, how often is Y bought too?) |
| Lift | How much more likely Y is to be bought when X is bought, compared with random chance (independence) |
Rule Example:
Rule: {Milk, Bread} --> {Butter}
- Support: % of transactions that include Milk, Bread, and Butter
- Confidence: % of transactions with Milk and Bread that also include Butter
- Lift: confidence divided by the confidence expected under independence, i.e., divided by the support of {Butter}
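To make the three measures concrete, the sketch below computes them for the rule {Milk, Bread} --> {Butter}. The eight baskets are invented purely for illustration and the function name `rule_metrics` is hypothetical; lift is computed as confidence divided by the support of the consequent.

```python
# Hypothetical shopping baskets, invented for this illustration.
BASKETS = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread"},
    {"Milk"},
    {"Butter", "Eggs"},
    {"Eggs"},
    {"Milk", "Bread", "Butter", "Eggs"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent --> consequent."""
    n = len(transactions)

    def count(items):
        # Number of transactions containing every item in `items`.
        return sum(1 for t in transactions if items <= t)

    both = count(antecedent | consequent)
    support = both / n
    confidence = both / count(antecedent)
    # Expected confidence under independence is the consequent's support.
    lift = confidence / (count(consequent) / n)
    return support, confidence, lift


print(rule_metrics({"Milk", "Bread"}, {"Butter"}, BASKETS))
# (0.375, 0.75, 1.5)
```

Here 3 of 8 baskets contain all three items (support 0.375), 3 of the 4 baskets with Milk and Bread also contain Butter (confidence 0.75), and Butter alone appears in half of the baskets, so the lift of 1.5 indicates that Butter is bought more often with Milk and Bread than independence would predict.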